Smart Paper Notes

VertMatch: A Semi-supervised Framework for Vertebral Structure Detection in 3D Ultrasound Volume

Hongye Zeng , Kang Zhou , Songhan Ge , Yuchong Gao , Jianhao Zhao , Shenghua Gao , Rui Zheng

Category: Computer Vision

2022-12-28

Three-dimensional (3D) ultrasound imaging has been applied to scoliosis assessment, but the current assessment method uses only the coronal projection image and cannot show the 3D deformity or vertebral rotation. Vertebra detection is essential for revealing 3D spine information, but the task is challenging because of complex data and limited annotations. We propose VertMatch, a two-step framework that detects vertebral structures in 3D ultrasound volumes by exploiting unlabeled data in a semi-supervised manner. The first step globally detects the possible positions of structures on each transverse slice, and local patches are then cropped at the detected positions. The second step determines whether each patch contains a real vertebral structure and thereby screens the positions predicted in the first step. VertMatch introduces three novel components for semi-supervised learning. For position detection in the first step, (1) an anatomical prior is used to screen the pseudo labels generated by a confidence-threshold method, and (2) multi-slice consistency exploits more unlabeled data by taking multiple adjacent slices as input. For patch identification in the second step, (3) the categories are rebalanced in each batch to address class imbalance. Experimental results demonstrate that VertMatch detects vertebrae accurately in ultrasound volumes and outperforms state-of-the-art methods. VertMatch is also validated in a clinical application on forty ultrasound scans, and it is a promising approach for 3D assessment of scoliosis.
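Two of the components named in the abstract, confidence-threshold pseudo labeling and per-batch class rebalancing, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the 0.9 threshold, and the round-robin sampling scheme are assumptions made for illustration, and the anatomical-prior screening step is not modeled because the abstract does not specify it.

```python
import random
from collections import defaultdict


def select_pseudo_labels(scored_positions, threshold=0.9):
    """Keep candidate positions whose confidence reaches the threshold.

    scored_positions is a list of (position, confidence) pairs; the
    threshold value is an illustrative assumption.
    """
    return [(pos, score) for pos, score in scored_positions if score >= threshold]


def rebalanced_batch(labels, batch_size, rng):
    """Sample indices so that each class contributes about equally.

    Classes are visited in round-robin order and one index is drawn
    (with replacement) from the current class at each step.
    """
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    classes = sorted(by_class)
    batch = []
    for i in range(batch_size):
        cls = classes[i % len(classes)]
        batch.append(rng.choice(by_class[cls]))
    return batch
```

With nine negative patches and one positive patch, a batch of eight drawn this way contains four of each class, at the cost of repeating the rare positive sample.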

Translated by Google Translate

Biomedical image analysis competitions: The state of current participation practice

Matthias Eisenmann , Annika Reinke , Vivienn Weru , Minu Dietlinde Tizabi , Fabian Isensee , Tim J. Adler , Patrick Godau , Veronika Cheplygina , Michal Kozubek , Sharib Ali

Categories: Computer Vision | Machine Learning

2022-12-16

The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about the common practice as well as bottlenecks faced by the community in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical image analysis, we designed an international survey that was issued to all participants of challenges conducted in conjunction with the IEEE ISBI 2021 and MICCAI 2021 conferences (80 competitions in total). The survey covered participants' expertise and working environments, their chosen strategies, as well as algorithm characteristics. A median of 72% of challenge participants took part in the survey. According to our results, knowledge exchange was the primary incentive (70%) for participation, while the reception of prize money played only a minor role (16%). While a median of 80 working hours was spent on method development, a large portion of participants stated that they did not have enough time for method development (32%). 25% perceived the infrastructure to be a bottleneck. Overall, 94% of all solutions were deep learning-based. Of these, 84% were based on standard architectures. 43% of the respondents reported that the data samples (e.g., images) were too large to be processed at once. This was most commonly addressed by patch-based training (69%), downsampling (37%), and solving 3D analysis tasks as a series of 2D tasks. K-fold cross-validation on the training set was performed by only 37% of the participants, and only 50% of the participants performed ensembling, based on either multiple identical models (61%) or heterogeneous models (39%). 48% of the respondents applied postprocessing steps.
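The patch-based strategy mentioned above, for samples too large to process at once, can be sketched as follows. This is an illustrative example only: the function names, the list-of-lists image representation, and the choice to add one extra patch flush with the far edge are assumptions, not something the survey prescribes.

```python
def patch_origins(length, patch, stride):
    """Start offsets along one axis so that patches cover the whole extent."""
    if patch >= length:
        return [0]
    starts = list(range(0, length - patch + 1, stride))
    if starts[-1] != length - patch:
        # Add a final patch aligned with the far edge so nothing is missed.
        starts.append(length - patch)
    return starts


def extract_patches(image, patch_h, patch_w, stride):
    """Cut a 2D image (list of rows) into patches of size patch_h x patch_w."""
    rows = patch_origins(len(image), patch_h, stride)
    cols = patch_origins(len(image[0]), patch_w, stride)
    return [
        [row[c:c + patch_w] for row in image[r:r + patch_h]]
        for r in rows
        for c in cols
    ]
```

A 5 x 5 image cut into 2 x 2 patches with stride 2 yields nine patches, the last row and column of patches overlapping their neighbors by one pixel.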


Text Mining-Based Patent Analysis for Automated Rule Checking in AEC

Zhe Zheng , Bo-Rui Kang , Qi-Tian Yuan , Yu-Cheng Zhou , Xin-Zheng Lu , Jia-Rui Lin

Categories: Natural Language Processing | Machine Learning

2022-12-12

Automated rule checking (ARC), which is expected to improve the efficiency of the compliance checking process in the architecture, engineering, and construction (AEC) industry, is gaining increasing attention. Shedding light on ARC application hotspots and forecasting their trends is useful to related research and can drive innovation. Therefore, this study takes patents from the Derwent Innovations Index (DII) database and the China National Knowledge Infrastructure (CNKI) as data sources and carries out a three-step analysis: (1) quantitative characteristics of the patents (i.e., annual distribution analysis), (2) identification of ARC topics using latent Dirichlet allocation (LDA), and (3) SNA-based co-occurrence analysis of ARC topics. The results show that the research hotspots and trends of Chinese and English patents are different. The contributions of this study are threefold: (1) an approach to comprehensive patent analysis that integrates multiple text mining methods (i.e., SNA and LDA) is introduced; (2) the application hotspots and development trends of ARC are reviewed based on patent analysis; and (3) a signpost for the technological development and innovation of ARC is provided.


BLOOM: A 176B-Parameter Open-Access Multilingual Language Model

Teven Le Scao , Angela Fan , Christopher Akiki , Ellie Pavlick , Suzana Ilić , Daniel Hesslow , Roman Castagné , Alexandra Sasha Luccioni , François Yvon , Matthias Gallé

Category: Natural Language Processing

2022-11-09

Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions. While these capabilities have led to widespread adoption, most LLMs are developed by resource-rich organizations and are frequently kept from the public. As a step towards democratizing this powerful technology, we present BLOOM, a 176B-parameter open-access language model designed and built thanks to a collaboration of hundreds of researchers. BLOOM is a decoder-only Transformer language model that was trained on the ROOTS corpus, a dataset comprising hundreds of sources in 46 natural and 13 programming languages (59 in total). We find that BLOOM achieves competitive performance on a wide variety of benchmarks, with stronger results after undergoing multitask prompted finetuning. To facilitate future research and applications using LLMs, we publicly release our models and code under the Responsible AI License.


DPNet: Dual-Path Network for Real-time Object Detection with Lightweight Attention

Quan Zhou , Huimin Shi , Weikang Xiang , Bin Kang , Xiaofu Wu , Longin Jan Latecki

Category: Computer Vision

2022-09-28

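Recent advances in compressing high-accuracy convolutional neural networks (CNNs) have brought remarkable progress in real-time object detection. To accelerate detection, lightweight detectors typically use a single-path backbone with few convolutional layers. A single-path architecture, however, involves successive pooling and downsampling operations, which tend to produce coarse and inaccurate feature maps that are ill-suited to localizing objects. In addition, because of their limited capacity, recent lightweight networks are often weak at representing large-scale visual data. To address these problems, this paper presents a dual-path network, named DPNet, with a lightweight attention scheme for real-time object detection. The dual-path architecture makes it possible to extract high-level semantic features and low-level object details in parallel. Although DPNet nearly duplicates the shape of a single-path detector, its computational cost and model size do not increase significantly. To enhance representation capability, a lightweight self-correlation module (LSCM) is designed to capture global interactions with little computational overhead and few network parameters. In the neck, LSCM is extended into a lightweight cross-correlation module (LCCM) that captures the mutual dependencies among features at neighboring scales. We conducted exhaustive experiments on the COCO and Pascal VOC 2007 datasets. The results show that DPNet achieves a state-of-the-art trade-off between detection accuracy and implementation efficiency. Specifically, DPNet achieves 30.5% AP on MS COCO test-dev and 81.5% mAP on the Pascal VOC 2007 test set, with a model size of nearly 2.5M, 1.04 GFLOPs, and 164 FPS and 196 FPS for 320 x 320 input images.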


What You See is What You Grasp: User-Friendly Grasping Guided by Near-eye-tracking

Shaochen Wang , Wei Zhang , Zhangli Zhou , Jiaxi Cao , Ziyang Chen , Kang Chen , Bin Li , Zhen Kan

Category: Robotics

2022-09-13

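This work presents a next-generation human-robot interface that can infer and realize the user's manipulation intention through sight alone. Specifically, we develop a system that integrates near-eye-tracking and robotic manipulation to enable user-specified actions (e.g., grasping and pick-and-place), in which visual information is merged with human attention to create a mapping to the desired robot actions. To enable sight-guided manipulation, a head-mounted near-eye-tracking device is developed to track eyeball movements in real time so that the user's visual attention can be identified. To improve grasping performance, a transformer-based grasp model is then developed. Stacked transformer blocks are used to extract hierarchical features, with the number of channels expanded at each stage while the resolution of the feature maps is reduced. Experimental validation demonstrates that the eye-tracking system yields low gaze estimation error and that the grasping system yields promising results on multiple grasping datasets. This work is a proof of concept for a gaze-interaction-based assistive robot, which holds great promise for helping elderly people or people with upper-limb disabilities in their daily lives. A demo video is available at https://www.youtube.com/watch?v=yuz1hukyurm.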


Coarse Retinal Lesion Annotations Refinement via Prototypical Learning

Qinji Yu , Kang Dang , Ziyu Zhou , Yongwei Chen , Xiaowei Ding

Category: Computer Vision

2022-08-30

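Deep-learning-based approaches to retinal lesion segmentation usually require a large amount of precisely pixel-annotated data. However, coarse annotations such as circles or ellipses that outline the lesion area can be six times more efficient to produce than pixel-level annotations. This paper therefore proposes an annotation refinement network that converts coarse annotations into pixel-level segmentation masks. Our main novelty is the application of the prototype learning paradigm to enhance generalization across different datasets and lesion types. We also introduce a prototype weighing module to handle challenging cases in which the lesion is overly small. The proposed method was trained on the publicly available IDRiD dataset and then generalized to the public DDR dataset and our real-world private dataset. Experiments show that our approach substantially improves the initial coarse masks and outperforms the non-prototypical baseline by a large margin. Moreover, we demonstrate the usefulness of the prototype weighing module in both cross-dataset and cross-class settings.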



giMLPs: Gate with Inhibition Mechanism in MLPs

Cheng Kang , Jindich Prokop , Lei Tong , Huiyu Zhou , Yong Hu , Daneil Novak

Category: Natural Language Processing

2022-08-01

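This paper presents a new model architecture, gate with inhibition MLP (giMLP). The gate with inhibition on CycleMLP (gi-CycleMLP) achieves on-par performance on the ImageNet classification task, and it also improves the BERT, RoBERTa, and DeBERTaV3 models by means of two novel techniques. The first is the gating MLP, in which matrix multiplications between the MLP and the trunk attention input further adjust the model's adaptation. The second is inhibition, which inhibits or enhances the branch adjustment; as the inhibition level increases, it imposes a stronger restriction on the model's features. We show that gi-CycleMLP with a lower inhibition level can be competitive with the original CycleMLP in terms of ImageNet classification accuracy. In addition, a comprehensive empirical study shows that these techniques significantly improve performance when fine-tuning on NLU downstream tasks. For fine-tuning DeBERTa with gate with inhibition MLPs (giDeBERTa), we find that it achieves appealing results on most NLU tasks without any additional pretraining. We also find that, when gate with inhibition is used, the activation function should have a short and smooth negative tail, so that unimportant features or features that hurt the model can be moderately inhibited. Experiments on ImageNet and twelve language downstream tasks demonstrate the effectiveness of gate with inhibition, both for image classification and for enhancing natural language fine-tuning without any additional pretraining.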


Improving Distantly Supervised Relation Extraction by Natural Language Inference

Kang Zhou , Qiao Qiao , Yuepei Li , Qi Li

Categories: Natural Language Processing | Machine Learning

2022-07-31

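To reduce human annotation for relation extraction (RE) tasks, distantly supervised approaches have been proposed, but they struggle with low performance. In this work, we propose a novel DSRE-NLI framework that considers both distant supervision from existing knowledge bases and indirect supervision from language models pretrained for other tasks. DSRE-NLI energizes an off-the-shelf natural language inference (NLI) engine with a semi-automatic relation verbalization (SARV) mechanism to provide indirect supervision, and it further consolidates the distant annotations to benefit multi-class RE models. The NLI-based indirect supervision acquires only one relation verbalization template from humans as a semantically general template for each relation; the template set is then enriched with high-quality textual patterns mined automatically from the distantly annotated corpus. With two simple and effective data consolidation strategies, the quality of the training data is substantially improved. Extensive experiments demonstrate that the proposed framework significantly improves on the SOTA performance (by up to 7.73% in F1) on distantly supervised RE benchmark datasets.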
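The idea of verbalizing each relation as a template and asking an NLI engine whether the sentence entails it can be sketched as below. This is a toy illustration under stated assumptions, not the DSRE-NLI implementation: the function names, the `{head}`/`{tail}` placeholder format, the 0.5 threshold, and in particular `toy_entail_score` (a word-overlap stand-in for a real NLI model) are all hypothetical.

```python
def verbalize(template, head, tail):
    """Fill a relation template with the two entity mentions."""
    return template.format(head=head, tail=tail)


def predict_relation(premise, head, tail, templates, entail_score, threshold=0.5):
    """Return the relation whose verbalization is best entailed by the premise.

    templates maps a relation name to a list of template strings.
    entail_score is any callable (premise, hypothesis) -> score in [0, 1].
    If no score exceeds the threshold, "no_relation" is returned.
    """
    best_relation, best_score = "no_relation", threshold
    for relation, relation_templates in templates.items():
        for template in relation_templates:
            score = entail_score(premise, verbalize(template, head, tail))
            if score > best_score:
                best_relation, best_score = relation, score
    return best_relation


def toy_entail_score(premise, hypothesis):
    """Hypothetical stand-in for an NLI model: the fraction of hypothesis
    words that also appear in the premise. A real system would call a
    pretrained NLI model here instead."""
    premise_words = set(premise.lower().split())
    hypothesis_words = hypothesis.lower().split()
    return sum(w in premise_words for w in hypothesis_words) / len(hypothesis_words)
```

Because the scorer is passed in as a callable, the word-overlap stand-in can be swapped for an actual entailment model without changing the selection logic.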


Semi-supervised Deep Multi-view Stereo

Hongbin Xu , Zhipeng Zhou , Weitao Cheng , Baigui Sun , Hao Li , Wenxiong Kang

Categories: Computer Vision | Artificial Intelligence

2022-07-24

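Significant progress has been made in learning-based multi-view stereo (MVS) under both supervised and unsupervised settings. To combine their respective merits in accuracy and completeness while reducing the demand for expensive labeled data, this paper explores a novel semi-supervised setting for learning-based MVS in which only a small part of the MVS data comes with dense depth ground truth. However, because of the large variation across scenes and the flexible settings of views, the semi-supervised MVS problem (Semi-MVS) may break the basic assumption of classic semi-supervised learning, namely that unlabeled and labeled data share the same label space and data distribution. To handle these issues, we propose a novel semi-supervised MVS framework, SE-MVS. For the simple case in which the basic assumption holds in the MVS data, consistency regularization encourages the model's predictions on an original sample and on a randomly augmented sample to be consistent through a constraint on the KL divergence. For the more troublesome case in which the basic assumption is violated in the MVS data, we propose a novel style consistency loss to alleviate the negative effect of the distribution gap. The visual style of an unlabeled sample is transferred to a labeled sample to shrink the gap, and the model's prediction on the generated sample is further supervised with the label of the original labeled sample. Experimental results on the DTU, BlendedMVS, GTA-SfM, and Tanks & Temples datasets show the superior performance of the proposed method. With the same backbone network settings, the proposed SE-MVS outperforms its fully supervised and unsupervised baselines.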
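The consistency regularization described above, a KL-divergence constraint between predictions on an original and an augmented sample, can be written as a small sketch. It assumes that the model outputs logits over a discrete set of candidates (for example, depth hypotheses at one pixel) and that the original prediction serves as the reference distribution; both are assumptions for illustration, and the function names are hypothetical rather than taken from SE-MVS.

```python
import math


def softmax(logits):
    """Convert logits to a probability distribution (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


def consistency_loss(logits_original, logits_augmented):
    """Penalize disagreement between the two predictions.

    Assumption: the prediction on the original sample is the reference
    distribution p, and the augmented prediction is q.
    """
    p = softmax(logits_original)
    q = softmax(logits_augmented)
    return kl_divergence(p, q)
```

The loss is zero when both predictions agree and grows as the augmented prediction drifts away from the original one.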
